
Conversation

@nlpcat (Contributor) commented Jul 30, 2022

What does this PR do?

Supports dynamic batch input for tf.function + generate (XLA); this is needed for batched TF Serving.

Export:

import tensorflow as tf
from transformers import TFAutoModelForSeq2SeqLM

class MyOwnModel(tf.Module):
    def __init__(self, model_path="t5-small"):
        super().__init__()
        self.model = TFAutoModelForSeq2SeqLM.from_pretrained(model_path)

    @tf.function(
        input_signature=(
            tf.TensorSpec((None, 32), tf.int32, name="input_ids"),
            tf.TensorSpec((None, 32), tf.int32, name="attention_mask"),
        ),
        jit_compile=True,
    )
    def serving(self, input_ids, attention_mask):
        outputs = self.model.generate(
            input_ids=input_ids, attention_mask=attention_mask, max_new_tokens=32, return_dict_in_generate=True
        )
        return {"sequences": outputs["sequences"]}

model = MyOwnModel()
export_dir = "./"
tf.saved_model.save(model, export_dir, signatures={"serving_default": model.serving})
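
To confirm that the exported signature really has a dynamic batch dimension, the SavedModel can be reloaded and inspected (a quick sanity check, not part of the PR):

import tensorflow as tf

loaded = tf.saved_model.load("./")
serving_fn = loaded.signatures["serving_default"]
# Expect input_ids / attention_mask specs with shape (None, 32): the batch axis is dynamic
print(serving_fn.structured_input_signature)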

Run the exported TF model:

import tensorflow as tf
from transformers import AutoTokenizer
export_dir = "./"
model = tf.saved_model.load(export_dir)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]

def generate_text(inputs):
    tokenized_inputs = tokenizer(inputs, **tokenization_kwargs)
    generated_texts = model.signatures["serving_default"](**tokenized_inputs)
    for text in generated_texts["sequences"]:
        print(tokenizer.decode(text, skip_special_tokens=True))

# The first call is slow (XLA compilation); later calls, even with a different batch size, are fast!
generate_text(input_prompts[:2])
generate_text(input_prompts[:3])

XLA run (without export):

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
xla_generate = tf.function(model.generate, jit_compile=True)
tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}

# The first prompt will be slow (compiling), the others will be very fast!
input_prompts = [
    f"translate English to {language}: I have four cats and three dogs."
    for language in ["German", "French", "Romanian"]
]
tokenized_inputs = tokenizer(input_prompts, **tokenization_kwargs)
generated_texts = xla_generate(**tokenized_inputs, max_new_tokens=32)
for text in generated_texts:
    print(tokenizer.decode(text, skip_special_tokens=True))
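
For context on why pad_to_multiple_of is the other key change: XLA compiles one program per static input shape, so padding every batch to a fixed multiple keeps the padded sequence length stable across prompts and avoids recompilation. A minimal, self-contained illustration of the retracing behavior (not from the PR):

import tensorflow as tf

@tf.function(jit_compile=True)
def row_sums(x):
    return tf.reduce_sum(x, axis=-1)

row_sums(tf.zeros((1, 7)))   # traces and compiles for length 7
row_sums(tf.zeros((1, 9)))   # new static length -> traces and compiles again
row_sums(tf.zeros((1, 32)))  # padding everything to a multiple of 32...
row_sums(tf.zeros((1, 32)))  # ...reuses one cached program, no recompilation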

This also works for beam search; change the exported serving method as follows:

def serving(self, input_ids, attention_mask):
    outputs = self.model.generate(
        input_ids=input_ids, attention_mask=attention_mask, max_new_tokens=32,
        return_dict_in_generate=True, num_beams=3, num_return_sequences=3,
    )
    return {"sequences": outputs["sequences"]}
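
With num_beams=3 and num_return_sequences=3, generate stacks the returned candidates along the batch axis, so sequences has shape (batch_size * 3, sequence_length). A small client-side sketch for grouping them back per prompt, reusing the names from the serving example above (illustrative only):

outputs = model.signatures["serving_default"](**tokenized_inputs)
for i, text in enumerate(outputs["sequences"]):
    print(f"prompt {i // 3}, candidate {i % 3}:", tokenizer.decode(text, skip_special_tokens=True))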

Fixes #18357
Fixes #16823

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @gante @patrickvonplaten

@nlpcat nlpcat marked this pull request as ready for review July 30, 2022 01:58
@HuggingFaceDocBuilderDev commented Jul 30, 2022

The documentation is not available anymore as the PR was closed or merged.

@nlpcat nlpcat force-pushed the fix.generate.batch branch from d4b873f to 59e80ca Compare July 30, 2022 06:18
@nlpcat nlpcat changed the title from "change shape to support dynamic batch input in tf.function XLA generate" to "change shape to support dynamic batch input in tf.function XLA generate for tf serving" Aug 1, 2022
@nlpcat (Contributor, Author) commented Aug 3, 2022

Cc @gante @patrickvonplaten

@gante (Contributor) commented Aug 3, 2022

Hi @nlpcat 👋 I see the change is needed because an unknown batch size is specified (hence the need for dynamic shapes). I'm going to double-check a few cases against this branch and, if all goes well, I may propose a few changes.

In general, I'm in favor of adding the change, thank you for the PR :)

@gante (Contributor) left a review comment

Thank you for this contribution! (I've double-checked that it doesn't affect the performance of generate)

One bit is missing, if you're up to it -- a test to ensure we don't lose this feature. The best place would probably be UtilsFunctionsTest inside test_modeling_tf_common.py, and the test could be a copy of the example you shared in the PR description.

Let us know if you'd rather have us add the test instead :)
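
For reference, a rough sketch of what such a test could look like; the name and structure below are illustrative, not the test that was eventually merged:

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

def test_xla_generate_dynamic_batch():
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

    @tf.function(
        input_signature=(
            tf.TensorSpec((None, 32), tf.int32, name="input_ids"),
            tf.TensorSpec((None, 32), tf.int32, name="attention_mask"),
        ),
        jit_compile=True,
    )
    def serving(input_ids, attention_mask):
        return model.generate(input_ids=input_ids, attention_mask=attention_mask, max_new_tokens=32)

    prompt = "translate English to German: I have four cats and three dogs."
    # both batch sizes must run through the same traced function without retracing
    for batch in ([prompt], [prompt, prompt]):
        inputs = tokenizer(batch, padding="max_length", max_length=32, return_tensors="tf")
        sequences = serving(inputs["input_ids"], inputs["attention_mask"])
        assert sequences.shape[0] == len(batch)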

@gante gante requested a review from sgugger August 3, 2022 16:46
@sgugger (Collaborator) left a review comment

Yes, adding a test would be nice.
Thanks a lot for your PR!

@gante (Contributor) commented Aug 3, 2022

(edited the PR header to link more issues this PR fixes :) )

@nlpcat nlpcat force-pushed the fix.generate.batch branch from 59e80ca to 596ecf4 Compare August 4, 2022 07:42
@nlpcat (Contributor, Author) commented Aug 4, 2022

@gante @sgugger I have added the test in 596ecf4.
Can you review and merge this PR if it looks good? Thanks.

@gante (Contributor) commented Aug 4, 2022

@nlpcat this is fantastic! Thank you so much for your contribution 🙏

@gante gante merged commit fc1d841 into huggingface:main Aug 4, 2022
@s4sarath commented
The whole idea of TensorFlow in Hugging Face is very complicated and a pain.
@nlpcat - you'd better look into

https://github.com/legacyai/tf-transformers/blob/main/docs/source/model_usage/text_generation_using_t5.ipynb

@rafaellemay commented
I was testing this code, but I found an issue with my model: I think the file tf_logits_process.py also needs to use the shape_list function to support dynamic batch input.
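
For readers unfamiliar with it, shape_list (in transformers.tf_utils; older versions expose it from transformers.modeling_tf_utils) returns static dimensions where known and falls back to tf.shape for the rest, which is exactly what dynamic batch support needs. A minimal sketch of the pattern being suggested, not the actual tf_logits_process.py change:

import tensorflow as tf
from transformers.tf_utils import shape_list  # assumed import path, see note above

@tf.function(input_signature=(tf.TensorSpec((None, 32), tf.int32),), jit_compile=True)
def batch_sized_zeros(input_ids):
    # input_ids.shape[0] is None inside this trace; shape_list returns a usable
    # dynamic value for the batch axis and the static 32 for the sequence axis
    batch_size, seq_len = shape_list(input_ids)
    return tf.zeros((batch_size, seq_len), dtype=tf.float32)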

@gante (Contributor) commented Dec 15, 2022

@rafaellemay can you open an issue with the problem that you found (and a snippet containing an example)? It would help us ensure the library works well in all cases :)


Development

Successfully merging this pull request may close these issues.

  • generate with tf.function (xla) not working for tf model export
  • (TF) model.generate to tf.function for tf serving
